Whenever your inputs are images or video frames, we use a Convolutional Neural Network (CNN). With the help of a CNN we can do image classification, object detection, face detection, object classification, and much more.
The human brain is divided into 4 parts. One of those parts contains the cerebral cortex.
Whenever you see a cat, the signal from your eyes reaches the cerebral cortex via neurons. Within the cerebral cortex there is the visual cortex, which contains multiple layers (V1, V2, V3, V4, V5, V6).
V1 -- responsible for finding the edges in the image (e.g. the edge of the cat's ear)
V2 -- gathers some more information about the image (is there any other object along with the cat?)
V4 -- extracts some information from the face
Similarly, all the layers V1 through V6 extract some information from the image, and all this information is passed from one layer to the next.
Finally, after all this extraction of information from the image, we are able to recognize it.
When an image is grayscale, each pixel has a value ranging from 0 to 255; a 6*6 image has 36 pixels in total. When an image is color, there are three channels, R, G, and B (when all three are combined we get the color image). Each channel has pixel values ranging from 0 to 255, so here the image is 6*6*3 ---> 3 is the number of channels.
Now, whenever we do normalization on these types of images, the pixel values will range from 0 to 1.
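As a quick sketch (the pixel values below are made up for illustration), normalization just divides every pixel by 255:

```python
import numpy as np

# Hypothetical grayscale pixel values in the 0-255 range
pixels = np.array([[0, 128, 255],
                   [64, 32, 191]], dtype=float)

normalized = pixels / 255.0  # now every value lies between 0 and 1
print(normalized.min(), normalized.max())  # 0.0 1.0
```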
Now assume we have the image below, scaled down to the 0-1 range (0 = white, 1 = black). We know that in the human brain the cerebral cortex contains the visual cortex, which has many layers to extract information from the image.
Similarly, we will create filters in CNN. These filters will play the role of those layers.
For example --- the V1 layer was responsible for finding the edges, so similar to that we will have a VERTICAL edge filter in CNN.
So when we apply this filter to any image, it will be able to identify the vertical edges in that image.
So whenever we apply a filter (3*3) on an image (6*6), we get an output of (4*4). We place the filter over the first 3*3 region of the image, multiply each cell of the filter with the corresponding cell of the image, and add up all the products to get a single value.
That value goes into the first cell of our (4*4) output. Now we take a stride jump of 1 and calculate the next value, and similarly we stride by 1 across the columns as well.
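The sliding-window arithmetic above can be sketched in NumPy (the 6*6 image and the vertical-edge filter values below are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical 6*6 image scaled to 0-1: left half white (0), right half black (1)
image = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

# A classic hand-crafted 3*3 vertical-edge filter
vertical_edge = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

def convolve2d(img, kernel, stride=1):
    f = kernel.shape[0]
    out_size = (img.shape[0] - f) // stride + 1  # n - f + 1 when stride is 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel)  # cell-wise multiply, then sum
    return out

result = convolve2d(image, vertical_edge)
print(result.shape)  # (4, 4)
```

Notice that the output is nonzero only in the columns where the white region meets the black region, i.e. exactly where the vertical edge is.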
Now, once we have the (4*4) output, we retransform it to get back values on the original scale -- by applying the min-max scaler in reverse to the output of the vertical edge detector.
Before the retransformation the output values are in the 0-1 range. After it, all these values are replaced by original-scale values:
Min value 0 ---> 0, Max value 1 ---> 255
0 is white, 255 is black. Since we applied an edge filter, in the output we can clearly see that it is differentiating the white and black EDGES.
Similarly there will be multiple filters that extract information from the image: vertical edge detector, horizontal edge detector, face-detection filters, and so on.
Keras provides the convolution machinery for us. We just need to specify the filter size (3*3, or any other) and the number of filters; the filter values themselves are learned.
So whenever training starts, weight updates also take place on the filter values, just like the weight updates we did in ANN.
n = number of pixels along one side of the image = 6, filter size f = 3
n - f + 1 = 6 - 3 + 1 = 4
which is the 4*4 matrix output of the convolution operation.
Now our output is 4*4, while our input image was 6*6 -- it means we are losing some information when we apply the convolution operation. Whenever we perform a convolution operation, its output is a matrix with fewer values, i.e. some information is lost.
To avoid that, i.e. to get the same output matrix (6*6) as the input image matrix (6*6) rather than (4*4), we use PADDING in CNN.
Now we want 6 as the output size when the convolution operation is performed:
n - f + 1 = 6 , n = 6 + f - 1 = 6 + 3 - 1 = 8
So n = 8. That means we have to increase the input image from (6*6) to (8*8). We do this by padding the input image on all sides: with padding = 1, we add 1 row/column on every side of the input image.
The most used technique is zero padding.
So when you put 0 in each padded cell and apply the convolution operation on the image with the filter, you get an output of 6*6.
padding = 1
n + 2p - f + 1 = 6 + (2*1) - 3 + 1 = 6 + 2 - 3 + 1 = 6 ---- Output (6*6)
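Both output-size formulas above can be wrapped in one small helper (a sketch; the stride parameter s is included for generality, the notes use stride 1):

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the output for an n*n image, f*f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))       # n - f + 1 = 4       (no padding: output shrinks)
print(conv_output_size(6, 3, p=1))  # n + 2p - f + 1 = 6  (padding preserves the size)
```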
Now, in ANN we know that once input features are applied, they go along with weights into the hidden layer, which contains neurons. A neuron consists of two steps: summation of inputs*weights + bias, then an activation function. This eventually gives us the final prediction in the output layer. While training, we have the actual value for that row; we compare it with the predicted value using a loss function. Our aim is to reduce the loss function using an optimizer. Weight updates take place in backpropagation through the optimizer formula that we have.
In CNN
The convolution operation is first applied to the input image with a filter to get the output. Then, just like in ANN, in the backpropagation process we will try to update the values of the filters.
In CNN, the learning has to be done on the filters.
Once we have the output of the convolution operation performed on the image, the ReLU activation function is applied to that output.
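ReLU simply zeroes out the negative values in the convolution output. A minimal sketch with made-up numbers:

```python
import numpy as np

# Example output of a convolution operation (may contain negative values)
conv_out = np.array([[ 3.0, -3.0],
                     [-1.5,  2.0]])

relu_out = np.maximum(0, conv_out)  # ReLU(x) = max(0, x), applied element-wise
print(relu_out)  # negatives replaced by 0, positives pass through unchanged
```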
A Convolutional Neural Network is stacked one layer after the other, i.e. if V1 (filter 1) is able to detect the face of the cat, this information is passed on to V2 (filter 2). Now V2 will detect some more features in the face, like the edges of the eyes, and so on.
All filter values are randomly initialized, but then with backpropagation all the values get updated.
Convolution operation + ReLU ----- 1 convolution layer. Similarly we will have multiple convolution layers with other filters in them to extract various information, just as is done in V1, V2, V3, V4, V5, V6... in the human brain.
Suppose one image contains multiple cat faces, and we have filters that detect faces. After the convolution operation is performed between the filter and the input image, we use a max pooling layer to localize the faces and identify where they are.
The max pooling layer does two important things: it reduces the size of the output (downsampling), and it keeps only the strongest responses, making the detection less sensitive to exact location.
Assume we have a max pooling filter of (2*2) with stride 2. When we apply this filter on the output of the convolution operation, it takes the highest value from each 2*2 block of that output. If a block at the border has fewer cells remaining, it takes the maximum of whatever cells remain.
In this way, the maximum intensity value is picked from the output of the convolution operation.
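The 2*2, stride-2 max pooling described above can be sketched as follows (the input values are made up; the ceiling in the output size handles border blocks that have fewer cells, as mentioned above):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    h, w = x.shape
    out_h = int(np.ceil(h / stride))  # ceil so leftover border rows/cols still get pooled
    out_w = int(np.ceil(w / stride))
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()  # keep only the highest intensity in each block
    return out

# Hypothetical 4*4 output of a convolution operation
conv_out = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 4, 0, 1],
], dtype=float)

pooled = max_pool(conv_out)
print(pooled)  # [[4. 5.] [4. 3.]] -- the max of each 2*2 block
```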
Note that the max pooling layer has no values of its own to learn -- unlike the convolution filters, it is not updated during backpropagation; gradients simply flow back through the positions that held the maximum.
Data augmentation helps you transform the input images into many different images, i.e. the output stays in the same category as the input image, but transformation techniques are applied to the dataset to increase its size.
Transformation techniques such as flipping, rotation, shifting, zooming, and adding noise.
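A minimal NumPy sketch of a few of these transformations (the image here is made up; in practice Keras utilities such as ImageDataGenerator apply these, plus shifting and zooming, for you):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Return simple transformed copies of one image: flip, rotate, add noise."""
    flipped = np.fliplr(image)                        # horizontal flip
    rotated = np.rot90(image)                         # 90-degree rotation
    noisy = image + rng.normal(0, 0.05, image.shape)  # add Gaussian noise
    return [flipped, rotated, noisy]

image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4*4 image
augmented = augment(image)
print(len(augmented))  # 3 new same-category images from 1 original
```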
Refer Data_Augmentation_cnn.py